 invariant region


Re-Attentional Controllable Video Diffusion Editing

Wang, Yuanzhi, Li, Yong, Liu, Mengyi, Zhang, Xiaoya, Liu, Xin, Cui, Zhen, Chan, Antoni B.

arXiv.org Artificial Intelligence

Editing videos with textual guidance has gained popularity because of its streamlined process, which requires users only to edit the text prompt corresponding to the source video. Recent studies have explored and exploited large-scale text-to-image diffusion models for text-guided video editing, resulting in remarkable video editing capabilities. However, they may still suffer from limitations such as mislocated objects and an incorrect number of objects. Therefore, the controllability of video editing remains a formidable challenge. In this paper, we aim to address the above limitations by proposing a Re-Attentional Controllable Video Diffusion Editing (ReAtCo) method. Specifically, to align the spatial placement of the target objects with the edited text prompt in a training-free manner, we propose Re-Attentional Diffusion (RAD) to refocus the cross-attention activation responses between the edited text prompt and the target video during the denoising stage, resulting in a spatially location-aligned and semantically high-fidelity manipulated video. In addition, to faithfully preserve the invariant region content with fewer border artifacts, we propose an Invariant Region-guided Joint Sampling (IRJS) strategy to mitigate the intrinsic sampling errors w.r.t. the invariant regions at each denoising timestep and constrain the generated content to be harmonized with the invariant region content. Experimental results verify that ReAtCo consistently improves the controllability of video diffusion editing and achieves superior video editing performance.
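
The idea of keeping an invariant region fixed during denoising can be illustrated with a plain masked blend. This is not the IRJS strategy from the abstract, whose details are not given here; it is a minimal sketch of a simpler, related technique. It assumes latents are 2-D lists of floats, a binary mask marks the invariant region with 1, and `toy_denoise_step` is a hypothetical stand-in for one step of a real sampler.

```python
def toy_denoise_step(latent):
    # Hypothetical stand-in for one denoising step of a diffusion sampler.
    return [[0.5 * v for v in row] for row in latent]


def blend(edited, source, mask):
    # Keep source values where mask == 1 (invariant region), edited values elsewhere.
    return [
        [m * s + (1 - m) * e for e, s, m in zip(e_row, s_row, m_row)]
        for e_row, s_row, m_row in zip(edited, source, mask)
    ]


def edit_with_invariant_region(latent, source_latents, mask):
    # source_latents holds one source latent per timestep (assumed to be
    # noised to the matching level by the caller).
    for src in source_latents:
        latent = toy_denoise_step(latent)
        latent = blend(latent, src, mask)
    return latent


result = edit_with_invariant_region(
    [[4.0, 4.0]],
    [[[3.0, 3.0]], [[3.0, 3.0]]],
    [[1, 0]],
)
```

In this toy run the first entry is pinned to the source value at every step, while the second entry evolves freely under the stand-in sampler.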


One-Shot Imitation Learning with Invariance Matching for Robotic Manipulation

Zhang, Xinyu, Boularias, Abdeslam

arXiv.org Artificial Intelligence

Learning a single universal policy that can perform a diverse set of manipulation tasks is a promising new direction in robotics. However, existing techniques are limited to learning policies that can only perform tasks encountered during training, and they require a large number of demonstrations to learn new tasks. Humans, on the other hand, can often learn a new task from a single unannotated demonstration. In this work, we propose the Invariance-Matching One-shot Policy Learning (IMOP) algorithm. In contrast to the standard practice of learning the end-effector's pose directly, IMOP first learns invariant regions of the state space for a given task, and then computes the end-effector's pose by matching the invariant regions between demonstrations and test scenes. Trained on the 18 RLBench tasks, IMOP achieves a success rate that consistently outperforms the state-of-the-art, by 4.5% on average over the 18 tasks. More importantly, IMOP can learn a novel task from a single unannotated demonstration without any fine-tuning, and achieves an average success rate improvement of 11.5% over the state-of-the-art on 22 novel tasks selected across nine categories. IMOP can also generalize to new shapes and learn to manipulate objects that are different from those in the demonstration. Further, IMOP can perform one-shot sim-to-real transfer using a single real-robot demonstration.
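
The step of computing a pose by matching regions between a demonstration and a test scene can be illustrated in a heavily simplified setting. The sketch below is not the IMOP algorithm: it assumes the correspondences between invariant-region points are already known, works in 2-D rather than with full 6-DoF poses, and fits a least-squares rigid transform in closed form. The function names `fit_rigid_2d` and `transfer_pose` are hypothetical.

```python
import math


def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src points onto dst."""
    n = len(src)
    cx_s = sum(p[0] for p in src) / n
    cy_s = sum(p[1] for p in src) / n
    cx_d = sum(p[0] for p in dst) / n
    cy_d = sum(p[1] for p in dst) / n
    dot = 0.0
    cross = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        ax, ay = xs - cx_s, ys - cy_s  # centered demonstration point
        bx, by = xd - cx_d, yd - cy_d  # centered test-scene point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, tx, ty


def transfer_pose(pose, theta, tx, ty):
    """Apply the fitted transform to a demonstrated pose (x, y, heading)."""
    x, y, heading = pose
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + tx, s * x + c * y + ty, heading + theta)


# Matched region points: the test scene is the demonstration rotated by
# 90 degrees and shifted by (2, 3).
demo_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
test_points = [(2.0, 3.0), (2.0, 4.0), (1.0, 3.0)]
theta, tx, ty = fit_rigid_2d(demo_points, test_points)
new_pose = transfer_pose((1.0, 1.0, 0.0), theta, tx, ty)
```

The demonstrated end-effector pose is carried into the test scene by the same transform that aligns the matched regions, so no pose is predicted directly.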


Binary Classification as a Phase Separation Process

Monteiro, Rafael

arXiv.org Machine Learning

We propose a new binary classification model called Phase Separation Binary Classifier (PSBC). It consists of a discretization of a nonlinear reaction-diffusion equation coupled with an ODE, and is inspired by fluid behavior, namely by how binary fluids phase separate. Hence, parameters and hyperparameters have physical meaning, and their effects are carefully studied in several different scenarios. PSBC's coefficients are trainable weights, chosen according to a minimization problem using Gradient Descent; optimization relies on classical Backpropagation with weight sharing. The model can be seen under the framework of feedforward networks, and is endowed with a nonlinear activation function that is linear in trainable weights but polynomial in other variables, yielding a cost function that is also polynomial. In view of the model's connection with ODEs and parabolic PDEs, forward propagation amounts to an initial value problem. Thus, stability conditions are established using the concept of invariant regions. Interesting model compression properties are thoroughly discussed. We illustrate the classifier's qualities by applying it to the subset of numbers "0" and "1" of the classical MNIST database, where we are able to discern individuals with more than 94% accuracy, sometimes using only about 10% of variables.
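
Forward propagation as an initial value problem, and the role of an invariant region, can be illustrated with a small explicit Euler scheme. This is a sketch under stated assumptions, not the PSBC model itself: the bistable nonlinearity u(1 - u)(u - alpha), the zero-flux boundary, the fixed (untrained) alpha values, and the thresholding rule in `classify` are all assumptions made for illustration.

```python
def reaction(u, alpha):
    # Assumed bistable nonlinearity with zeros at 0, alpha, and 1.
    return u * (1.0 - u) * (u - alpha)


def forward(u0, alpha, eps=0.1, dt=0.1, steps=200):
    """Explicit Euler steps of u_t = eps * u_xx + u(1 - u)(u - alpha)
    on a 1-D grid with unit spacing and zero-flux boundaries."""
    u = list(u0)
    n = len(u)
    for _ in range(steps):
        new = []
        for i in range(n):
            left = u[i - 1] if i > 0 else u[i]
            right = u[i + 1] if i < n - 1 else u[i]
            lap = left - 2.0 * u[i] + right
            new.append(u[i] + dt * (eps * lap + reaction(u[i], alpha[i])))
        u = new
    return u


def classify(u0, alpha, **kwargs):
    # Assumed decision rule: threshold the mean of the final state at 0.5.
    u = forward(u0, alpha, **kwargs)
    return 1 if sum(u) / len(u) > 0.5 else 0


alpha = [0.5, 0.5, 0.5, 0.5]
high = forward([0.8, 0.7, 0.9, 0.6], alpha)
low = forward([0.2, 0.3, 0.1, 0.4], alpha)
```

With initial data and alpha in [0, 1] and these small values of `dt` and `eps * dt`, every iterate stays in [0, 1], which is the sense in which [0, 1] acts as an invariant region for this toy scheme; larger step sizes can break that property.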